@shanbady shanbady commented Feb 6, 2025

What are the relevant tickets?

Closes https://github.com/mitodl/hq/issues/6670

Description (What does it do?)

With this PR, embedding generation for both learning resources and contentfiles skips items that already have embeddings by default. To force re-embedding, pass the --overwrite flag to the generate_embeddings command.
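
The skip-by-default behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `ids_to_embed` and its parameters are hypothetical, and how the real task looks up existing embeddings in qdrant is not shown here.

```python
from typing import Iterable, List, Set


def ids_to_embed(
    candidate_ids: Iterable[int],
    existing_ids: Set[int],
    overwrite: bool = False,
) -> List[int]:
    """Return the ids that still need embeddings (hypothetical helper).

    With overwrite=False (the default), ids that already have an
    embedding are skipped. With overwrite=True, every candidate id
    is returned so that it gets re-embedded.
    """
    if overwrite:
        return list(candidate_ids)
    return [i for i in candidate_ids if i not in existing_ids]
```

Under these assumptions, a second run over the same ids yields an empty list unless `overwrite=True` is passed, which mirrors the behavior the testing steps check for.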

How can this be tested?

  1. Check out this branch.
  2. Restart celery.
  3. Get the id of a learning resource that has associated contentfiles and pass it to the generate_embeddings command: python manage.py generate_embeddings --resource-ids <the id>
  4. Observe the celery container's output. There should be output indicating that embeddings are being generated (the output differs depending on whether fastembed or ollama via litellm is used).
  5. Once that completes, verify that there are embeddings on the qdrant dashboard.
  6. While keeping an eye on the celery output, re-run the embedding command.
  7. Note that the existing embeddings are skipped this time.
  8. Re-run the command with the overwrite flag: python manage.py generate_embeddings --resource-ids <the id> --overwrite and note that embeddings are generated once again.
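
For reference, the two flags used in the steps above could be parsed along these lines. This is only a sketch using the standard-library `argparse` (which Django management commands build on). The comma-separated format for `--resource-ids` and the helper names `build_parser` and `parse_ids` are assumptions; the actual command's argument handling is not shown in this PR description.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the two flags mentioned in the testing steps."""
    parser = argparse.ArgumentParser(prog="generate_embeddings")
    parser.add_argument(
        "--resource-ids",
        dest="resource_ids",
        default="",
        # Assumption: ids are passed as a comma-separated string.
        help="Comma-separated learning resource ids to embed",
    )
    parser.add_argument(
        "--overwrite",
        action="store_true",
        help="Re-generate embeddings even if they already exist",
    )
    return parser


def parse_ids(raw: str) -> List[int]:
    """Turn a comma-separated string such as '12,34' into [12, 34]."""
    return [int(part) for part in raw.split(",") if part.strip()]
```

Because `--overwrite` uses `store_true`, it defaults to `False`, so omitting the flag corresponds to the skip-existing behavior.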

@shanbady shanbady changed the title Shanbady/skip existing cf embeddings Skip existing embeddings Feb 6, 2025
@shanbady shanbady added the Work in Progress and Needs Review labels and removed the Work in Progress label Feb 6, 2025
@shanbady shanbady marked this pull request as ready for review February 6, 2025 19:27
@rhysyngsun rhysyngsun assigned rhysyngsun and unassigned rhysyngsun Feb 6, 2025
@mbertrand mbertrand self-assigned this Feb 7, 2025

@mbertrand mbertrand left a comment

👍 works great

@mbertrand mbertrand added the Waiting on author label and removed the Needs Review label Feb 7, 2025
@shanbady shanbady merged commit 64f5c9a into main Feb 8, 2025
11 checks passed
@shanbady shanbady deleted the shanbady/skip-existing-cf-embeddings branch February 8, 2025 18:45
@odlbot odlbot mentioned this pull request Feb 10, 2025
12 tasks